feat(BA-5373): Add blue-green deployment infrastructure and promote API #10426

Draft
jopemachine wants to merge 12 commits into main from BA-3436-promote-api

Conversation

Member

@jopemachine jopemachine commented Mar 23, 2026

Resolves BA-5373.

Summary

  • Add DeployingAwaitingPromotionHandler for blue-green AWAITING_PROMOTION sub-step processing
  • Add promoteDeployment GraphQL mutation for manual blue-green promotion
  • Add promote_deployment repository method with atomic route switch (promote green → ACTIVE, drain blue → TERMINATING, swap revision)
  • Wire promote through full stack: DTO → Action → Service → Processor → Adapter → GQL
  • Add promote_route_ids to RouteChanges for blue-green traffic switch
  • Add DEPLOYING_AWAITING_PROMOTION to DeploymentLifecycleSubStep
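
The atomic route switch described in the summary (promote green, drain blue, swap revision) can be pictured with a small in-memory sketch. This is not the repository code from this PR: `SketchRouteState`, `SketchDeployment`, and `promote` are hypothetical names, and a staged copy stands in for the database transaction, so either all three changes become visible or none do.

```python
import copy
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Iterable, Optional


class SketchRouteState(Enum):
    # Hypothetical states for illustration; STANDBY is an assumption.
    STANDBY = "standby"
    ACTIVE = "active"
    TERMINATING = "terminating"


@dataclass
class SketchDeployment:
    current_revision: str
    deploying_revision: Optional[str]
    routes: Dict[str, SketchRouteState]


def promote(
    dep: SketchDeployment,
    promote_ids: Iterable[str],
    drain_ids: Iterable[str],
) -> SketchDeployment:
    """Return a new deployment with every change applied, or raise and change nothing."""
    staged = copy.deepcopy(dep)  # stand-in for "inside the transaction"
    if staged.deploying_revision is None:
        raise ValueError("no deploying revision to promote")
    for route_id in promote_ids:
        if route_id not in staged.routes:
            raise KeyError(route_id)
        staged.routes[route_id] = SketchRouteState.ACTIVE  # green starts serving
    for route_id in drain_ids:
        if route_id not in staged.routes:
            raise KeyError(route_id)
        staged.routes[route_id] = SketchRouteState.TERMINATING  # blue drains
    # Swap the revision last; the caller only ever sees the finished copy.
    staged.current_revision = staged.deploying_revision
    staged.deploying_revision = None
    return staged
```

Because the input object is never mutated, a failure partway through (for example an unknown route ID) leaves the original deployment exactly as it was, which is the property the summary calls atomic.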

Context

This PR provides the infrastructure layer for the blue-green deployment strategy (BA-3436). The core strategy FSM (BlueGreenStrategy) is in a stacked PR on top of this one.

Test Plan

  • Existing deployment coordinator tests pass
  • ruff lint/format passes

🤖 Generated with Claude Code


📚 Documentation preview 📚: https://sorna--10426.org.readthedocs.build/en/10426/


📚 Documentation preview 📚: https://sorna-ko--10426.org.readthedocs.build/ko/10426/

Copilot AI review requested due to automatic review settings March 23, 2026 10:34
@github-actions github-actions Bot added the size:XL (500~ LoC), area:docs (Documentations), comp:manager (Related to Manager component), and comp:common (Related to Common component) labels Mar 23, 2026
@jopemachine jopemachine changed the title from "feat(BA-3436): Add blue-green deployment infrastructure and promote API" to "feat(BA-5373): Add blue-green deployment infrastructure and promote API" Mar 23, 2026
@jopemachine jopemachine added this to the 26.4 milestone Mar 23, 2026
@jopemachine jopemachine marked this pull request as draft March 23, 2026 10:37
Contributor

Copilot AI left a comment

Pull request overview

This PR adds infrastructure support for blue-green deployments by introducing an AWAITING_PROMOTION sub-step handler and wiring a manual “promote deployment” operation end-to-end (service → repository → GraphQL), including atomic route traffic switching and revision swap.

Changes:

  • Added DEPLOYING_AWAITING_PROMOTION sub-step and a new DeployingAwaitingPromotionHandler to support the pause-before-promotion phase.
  • Added promoteDeployment GraphQL mutation (DTOs, adapter, action, processor, service) for manual promotion.
  • Implemented DeploymentRepository.promote_deployment() and extended strategy mutation plumbing to support “promote” route updates in the DB transaction.

Reviewed changes

Copilot reviewed 22 out of 22 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| `src/ai/backend/manager/sokovan/deployment/handlers/deploying.py` | Adds AWAITING_PROMOTION handler and adjusts DEPLOYING/PROVISIONING behavior. |
| `src/ai/backend/manager/sokovan/deployment/handlers/__init__.py` | Exports the new deploying handler. |
| `src/ai/backend/manager/sokovan/deployment/coordinator.py` | Registers the new DEPLOYING/AWAITING_PROMOTION handler. |
| `src/ai/backend/manager/services/deployment/service.py` | Adds promote_deployment() service method and route classification logic. |
| `src/ai/backend/manager/services/deployment/processors.py` | Wires the new promote action into processors/supported actions. |
| `src/ai/backend/manager/services/deployment/actions/revision_operations/promote_deployment.py` | Introduces the promote action + result types. |
| `src/ai/backend/manager/services/deployment/actions/revision_operations/__init__.py` | Exports the promote action types. |
| `src/ai/backend/manager/repositories/deployment/repository.py` | Adds promote_deployment() and extends apply_strategy_mutations() signature to include promote. |
| `src/ai/backend/manager/repositories/deployment/db_source/db_source.py` | Executes promote route updates as part of strategy mutation transaction. |
| `src/ai/backend/manager/data/deployment/types.py` | Adds DEPLOYING_AWAITING_PROMOTION to lifecycle sub-steps list. |
| `src/ai/backend/manager/api/gql/schema.py` | Registers promote_deployment mutation. |
| `src/ai/backend/manager/api/gql/deployment/types/revision.py` | Adds GraphQL input/payload types for promotion. |
| `src/ai/backend/manager/api/gql/deployment/types/__init__.py` | Exports promotion input/payload GraphQL types. |
| `src/ai/backend/manager/api/gql/deployment/resolver/revision.py` | Adds the promote_deployment mutation resolver. |
| `src/ai/backend/manager/api/gql/deployment/resolver/__init__.py` | Exports the new resolver symbol. |
| `src/ai/backend/manager/api/gql/deployment/__init__.py` | Re-exports new GraphQL types and resolver. |
| `src/ai/backend/manager/api/adapters/deployment.py` | Adds adapter method to trigger the promote action. |
| `src/ai/backend/common/dto/manager/v2/deployment/response.py` | Adds PromoteDeploymentPayload DTO. |
| `src/ai/backend/common/dto/manager/v2/deployment/request.py` | Adds PromoteDeploymentInput DTO. |
| `docs/manager/graphql-reference/v2-schema.graphql` | Documents the new mutation and input/payload types (also includes an unrelated schema change). |
| `docs/manager/graphql-reference/supergraph.graphql` | Same as above for the supergraph schema reference. |

Comments suppressed due to low confidence (2)

docs/manager/graphql-reference/v2-schema.graphql:2972

  • The generated GraphQL reference removed lastUsedAt from ImageV2MetadataInfo, but the Strawberry schema still defines last_used_at (see src/ai/backend/manager/api/gql/image/types.py:206). This makes the published schema docs inconsistent with the actual API. Please regenerate these schema reference files from the current schema or revert the unrelated removal.
type ImageV2MetadataInfo {
  """Config digest for verification."""
  digest: String

  """Image size in bytes."""
  sizeBytes: Int!

  """Image creation timestamp."""
  createdAt: DateTime

  """Timestamp of the most recent session created with this image."""
  lastUsedAt: DateTime

docs/manager/graphql-reference/supergraph.graphql:5323

  • Same as v2-schema.graphql: lastUsedAt was removed from ImageV2MetadataInfo in the supergraph reference, but the Strawberry schema still exposes it. Regenerate or revert to keep schema references consistent.
type ImageV2MetadataInfo
  @join__type(graph: STRAWBERRY)
{
  """Config digest for verification."""
  digest: String

  """Image size in bytes."""
  sizeBytes: Int!

  """Image creation timestamp."""
  createdAt: DateTime

  """Timestamp of the most recent session created with this image."""
  lastUsedAt: DateTime

  """Parsed tag components."""
  tags: [ImageV2TagEntry!]!

Comment thread src/ai/backend/manager/repositories/deployment/repository.py
Comment thread src/ai/backend/manager/sokovan/deployment/handlers/deploying.py
Comment thread src/ai/backend/manager/sokovan/deployment/handlers/deploying.py Outdated
Comment thread src/ai/backend/manager/sokovan/deployment/handlers/deploying.py
Comment thread src/ai/backend/manager/services/deployment/service.py
Comment thread src/ai/backend/manager/services/deployment/service.py
@jopemachine jopemachine force-pushed the BA-3436-promote-api branch from e3eca7a to 97874ea Compare April 2, 2026 07:36
@github-actions github-actions Bot added the size:L (100~500 LoC) label and removed the size:XL (500~ LoC) label Apr 2, 2026
@jopemachine jopemachine force-pushed the BA-3436-promote-api branch from 00d7731 to 3d57556 Compare April 2, 2026 08:12
@github-actions github-actions Bot added the size:XL (500~ LoC) label and removed the size:L (100~500 LoC) label Apr 2, 2026
jopemachine and others added 8 commits April 15, 2026 14:28
- Add DeployingAwaitingPromotionHandler for AWAITING_PROMOTION sub-step
- Add promoteDeployment GraphQL mutation for manual blue-green promotion
- Add promote_deployment repository method with atomic route switch
- Wire promote through full stack: DTO, Action, Service, Processor, Adapter, GQL
- Add promote_route_ids to RouteChanges for blue-green traffic switch
- Add DEPLOYING_AWAITING_PROMOTION to DeploymentLifecycleSubStep

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: octodog <mu001@lablup.com>
…ns call

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…mote resolver

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… promote API

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… health check comparison

RouteStatus is a lifecycle enum (PROVISIONING, RUNNING, etc.) and does not have a HEALTHY member.
The health check status uses a separate RouteHealthStatus enum with the HEALTHY attribute.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
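
The distinction drawn in the commit message above, a lifecycle enum versus a separate health enum, can be illustrated with two simplified stand-ins. The member values and the `UNHEALTHY` member are assumptions for the sketch; only `PROVISIONING`, `RUNNING`, and `HEALTHY` are named in the commit message, and `is_route_healthy` is a hypothetical helper.

```python
from enum import Enum


class RouteStatus(Enum):
    # Simplified stand-in: lifecycle states only, no HEALTHY member.
    PROVISIONING = "provisioning"
    RUNNING = "running"


class RouteHealthStatus(Enum):
    # Simplified stand-in: health check results live in their own enum.
    HEALTHY = "healthy"
    UNHEALTHY = "unhealthy"  # assumed member, for illustration


def is_route_healthy(health: RouteHealthStatus) -> bool:
    """Compare against the health enum, never against the lifecycle enum."""
    return health is RouteHealthStatus.HEALTHY


# Accessing RouteStatus.HEALTHY would raise AttributeError, which is the
# class of bug the commit message describes.
```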
@jopemachine jopemachine force-pushed the BA-3436-promote-api branch from 8a02693 to 5ba1db6 Compare April 15, 2026 05:31
jopemachine and others added 3 commits April 16, 2026 10:00
…ontext

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Guard tz-naive phase_started_at when computing auto-promote delay
- Reject manual promotion when no healthy green routes exist

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
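
A minimal sketch of the tz-naive guard mentioned in the commit above, assuming that a naive `phase_started_at` should be interpreted as UTC (the PR does not state which zone is assumed). The function name `seconds_until_auto_promote` and its parameters are hypothetical. Without the guard, subtracting an aware datetime from a naive one raises `TypeError`.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def seconds_until_auto_promote(
    phase_started_at: datetime,
    delay: timedelta,
    now: Optional[datetime] = None,
) -> float:
    """Return the remaining wait in seconds, clamped at zero."""
    if phase_started_at.tzinfo is None:
        # Assumption: naive timestamps were recorded in UTC.
        phase_started_at = phase_started_at.replace(tzinfo=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    remaining = (phase_started_at + delay - now).total_seconds()
    return max(0.0, remaining)
```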
- Reject when deployment is not in AWAITING_PROMOTION
- Reject when deploying_revision_id is missing
- Reject when no healthy green routes exist
- Classify HEALTHY green routes as promote and active blue routes as drain

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
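
The four rules listed in the commit above can be sketched as one validation and classification function. This is an illustration under stated assumptions, not the service code: `SketchRoute`, `SketchHealth`, `PromotionRejected`, and `classify_for_promotion` are hypothetical names, and "green" is assumed to mean routes on the deploying revision while "blue" means active routes on any other revision.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Sequence, Tuple


class SketchHealth(Enum):
    HEALTHY = "healthy"
    UNHEALTHY = "unhealthy"


@dataclass(frozen=True)
class SketchRoute:
    route_id: str
    revision_id: str
    health: SketchHealth
    active: bool


class PromotionRejected(Exception):
    """Raised when a manual promotion request fails validation."""


def classify_for_promotion(
    sub_step: str,
    deploying_revision_id: Optional[str],
    routes: Sequence[SketchRoute],
) -> Tuple[List[str], List[str]]:
    """Return (promote_ids, drain_ids) or raise PromotionRejected."""
    if sub_step != "AWAITING_PROMOTION":
        raise PromotionRejected("deployment is not awaiting promotion")
    if deploying_revision_id is None:
        raise PromotionRejected("deploying_revision_id is missing")
    # Green: healthy routes that belong to the revision being deployed.
    promote_ids = [
        r.route_id
        for r in routes
        if r.revision_id == deploying_revision_id
        and r.health is SketchHealth.HEALTHY
    ]
    if not promote_ids:
        raise PromotionRejected("no healthy green routes exist")
    # Blue: routes of other revisions that are currently serving traffic.
    drain_ids = [
        r.route_id
        for r in routes
        if r.revision_id != deploying_revision_id and r.active
    ]
    return promote_ids, drain_ids
```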
@jopemachine jopemachine force-pushed the BA-3436-promote-api branch from 6e8224f to 2e2c2b7 Compare April 16, 2026 01:38

Labels

area:docs (Documentations), comp:common (Related to Common component), comp:manager (Related to Manager component), size:XL (500~ LoC)

2 participants